The architecture of the Festival speech synthesis system
Abstract
We describe a new formalism for storing linguistic data in a text-to-speech system. Linguistic entities such as words and phones are stored as feature structures in a general object called a linguistic item. Items are configurable at run time and, via their feature structure, can contain arbitrary information. Linguistic relations store the relationships between items of the same linguistic type. Relations can take any graph structure but are commonly trees or lists. An utterance structure contains all the items and relations belonging to a single utterance. We first describe the design goals for a synthesis architecture and then describe some problems with previous architectures. We then discuss our new formalism in general, along with the implementation details and consequences of our approach.
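The three notions named in the abstract (items, relations, utterances) can be illustrated with a small sketch. This is not Festival's actual API: the class names `Item`, `Node`, `Relation` and `Utterance`, their methods, and the relation names "Word" and "Syntax" are all hypothetical and chosen only to mirror the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Item:
    """A linguistic entity (word, phone, ...) holding an open-ended feature structure."""
    features: Dict[str, Any] = field(default_factory=dict)

    def feat(self, name: str, default: Any = None) -> Any:
        return self.features.get(name, default)

    def set_feat(self, name: str, value: Any) -> None:
        # Features can be added or changed at run time; no fixed schema is assumed.
        self.features[name] = value


@dataclass(eq=False)
class Node:
    """The position of one item inside one relation; children allow tree shapes."""
    item: Item
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, item: Item) -> "Node":
        child = Node(item, parent=self)
        self.children.append(child)
        return child


@dataclass
class Relation:
    """A named structure over items: a list if no node has children, else a tree."""
    name: str
    roots: List[Node] = field(default_factory=list)

    def append(self, item: Item) -> Node:
        node = Node(item)
        self.roots.append(node)
        return node

    def items(self) -> List[Item]:
        """Return items in depth-first, left-to-right order."""
        out: List[Item] = []
        stack = list(reversed(self.roots))
        while stack:
            node = stack.pop()
            out.append(node.item)
            stack.extend(reversed(node.children))
        return out


@dataclass
class Utterance:
    """Holds every relation (and hence every item) of a single utterance."""
    relations: Dict[str, Relation] = field(default_factory=dict)

    def create_relation(self, name: str) -> Relation:
        rel = Relation(name)
        self.relations[name] = rel
        return rel


# Hypothetical usage: a list-shaped relation and a tree-shaped relation.
utt = Utterance()
words = utt.create_relation("Word")
for w in ("hello", "world"):
    words.append(Item({"name": w}))

syntax = utt.create_relation("Syntax")
s = syntax.append(Item({"name": "S"}))
s.add_child(Item({"name": "NP"}))
s.add_child(Item({"name": "VP"}))

# Arbitrary information can be attached to an item after creation.
words.roots[0].item.set_feat("pos", "uh")
```

In this sketch a list is just a relation whose nodes have no children, so one container type covers both shapes the abstract mentions; a general graph would need extra link types that are left out here.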
Similar resources
On the First Greek-TTS Based on Festival Speech Synthesis
In this article we describe the first Text To Speech (TTS) system for the Greek language based on the Festival architecture. We discuss practical implementation details and we capitalize on the preparation of the diphone database and on the phoneme duration prediction module implemented with the CART tree technique. Two male databases were used for two different speech synthesis engines, namely, re...
Phonetic Grammars for Gf
We discuss the problem of using the well-known Festival speech synthesis system for languages with no voices/lexicons provided in the current distributions. We consider various parts of the system implemented according to the standard Text-To-Speech (TTS) system architecture. 1. Grammatical Framework grammar formalism for multilingual
Integrating Festival and Windows
Festival is a popular open-source development and execution environment for speech synthesis. It has been well-integrated within many environments, particularly Unix ones, but so far has not been easy to integrate natively into Windows. We present two solutions to this: an MSAPI interface, which allows Festival voices to work with a range of speech-enabled Windows applications, and SpeechServer...
Diphone-Based Concatenative Speech Synthesis System for Mongolian
This paper describes the first Text-to-Speech (TTS) system for the Mongolian language, using the general speech synthesis architecture of Festival. The TTS is based on diphone concatenative synthesis, applying the TD-PSOLA technique. The conversion process from input text into acoustic waveform is performed in a number of steps consisting of functional components. Procedures and functions for the s...
Speect: a multilingual text-to-speech system
This paper introduces a new multilingual text-to-speech system, which we call Speect (Speech synthesis with extensible architecture), aiming to address the shortcomings of using Festival as a research system and Flite as a deployment system in a multilingual development environment. Speect is implemented in C with a modular object-oriented approach and a plugin architecture, aiming to separate ...
Text To Speech for Bangla Language using Festival
In this paper, we present a Text to Speech (TTS) synthesis system for the Bangla language using the open-source Festival TTS engine. Festival is a complete TTS synthesis system, with components supporting front-end processing of the input text, language modeling, and speech synthesis using its signal processing module. The Bangla TTS system proposed here creates the voice data for Festival, and add...